9 research outputs found
Explicit diversification of event aspects for temporal summarization
During major events, such as emergencies and disasters, a large volume of information is reported on newswire and social media platforms. Temporal summarization (TS) approaches are used to automatically produce concise overviews of such events by extracting text snippets from related articles over time. Current TS approaches rely on a combination of event relevance and textual novelty for snippet selection. However, for events that span multiple days, textual novelty is often a poor criterion for selecting snippets, since many snippets are textually unique but are semantically redundant or non-informative. In this article, we propose a framework for the diversification of snippets using explicit event aspects, building on recent works in search result diversification. In particular, we first propose two techniques to identify explicit aspects that a user might want to see covered in a summary for different types of event. We then extend a state-of-the-art explicit diversification framework to maximize the coverage of these aspects when selecting summary snippets for unseen events. Through experimentation over the TREC TS 2013, 2014, and 2015 datasets, we show that explicit diversification for temporal summarization significantly outperforms classical novelty-based diversification, as the use of explicit event aspects reduces the amount of redundant and off-topic snippets returned, while also increasing summary timeliness
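As a rough illustration of the idea in the abstract above, the following is a minimal sketch of greedy explicit diversification over event aspects. The abstract does not name the framework it extends, so the xQuAD-style objective used here is an assumption, as are the function name `diversify_snippets`, the aspect names, the weights, and the coverage scores.

```python
def diversify_snippets(snippets, aspect_weights, k, lam=0.5):
    """Greedily pick k snippets, trading off relevance and aspect coverage.

    snippets: dict mapping snippet id -> (relevance, {aspect: coverage in [0, 1]})
    aspect_weights: dict mapping aspect -> importance (hypothetical values)
    lam: interpolation between relevance (0.0) and diversity (1.0)
    """
    selected = []
    # Probability-like mass of each aspect that is still NOT covered.
    uncovered = {aspect: 1.0 for aspect in aspect_weights}
    candidates = dict(snippets)

    def score(item):
        relevance, coverage = item[1]
        diversity = sum(
            weight * coverage.get(aspect, 0.0) * uncovered[aspect]
            for aspect, weight in aspect_weights.items()
        )
        return (1 - lam) * relevance + lam * diversity

    while candidates and len(selected) < k:
        best_id, (_, best_coverage) = max(candidates.items(), key=score)
        selected.append(best_id)
        # Discount aspects that the chosen snippet already covers.
        for aspect in uncovered:
            uncovered[aspect] *= 1.0 - best_coverage.get(aspect, 0.0)
        del candidates[best_id]
    return selected


# Hypothetical example: two snippets repeat the same aspect, one adds a new one.
example = {
    "s1": (0.9, {"casualties": 1.0}),
    "s2": (0.8, {"casualties": 1.0}),
    "s3": (0.5, {"location": 1.0}),
}
weights = {"casualties": 0.5, "location": 0.5}
print(diversify_snippets(example, weights, k=2))
```

In this toy input, the second snippet is more relevant than the third but only repeats an aspect that is already covered, so the less relevant snippet about a new aspect is preferred in the second step.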
Diversity and novelty in information retrieval
This tutorial aims to provide a unifying account of current research on diversity and novelty in different IR domains, namely, in the context of search engines, recommender systems, and data streams
Learning to rank query suggestions for adhoc and diversity search
Query suggestions have become pervasive in modern web search, as a mechanism to guide users towards a better representation of their information need. In this article, we propose a ranking approach for producing effective query suggestions. In particular, we devise a structured representation of candidate suggestions mined from a query log that leverages evidence from other queries with a common session or a common click. This enriched representation not only helps overcome data sparsity for long-tail queries, but also leads to multiple ranking criteria, which we integrate as features for learning to rank query suggestions. To validate our approach, we build upon existing efforts for web search evaluation and propose a novel framework for the quantitative assessment of query suggestion effectiveness. Thorough experiments using publicly available data from the TREC Web track show that our approach provides effective suggestions for adhoc and diversity search
About learning models with multiple query dependent features
Several questions remain unanswered by the existing literature concerning the deployment of query dependent features within learning to rank. In this work, we investigate three research questions to empirically ascertain best practices for learning to rank deployments: (i) Previous work in data fusion that pre-dates learning to rank showed that while different retrieval systems could be effectively combined, the combination of multiple models within the same system was not as effective. In contrast, the existing learning to rank datasets (e.g. LETOR), often deploy multiple weighting models as query dependent features within a single system, raising the question as to whether such combination is needed. (ii) Next, we investigate whether the training of weighting model parameters, traditionally required for effective retrieval, is necessary within a learning to rank context. (iii) Finally, we note that existing learning to rank datasets use weighting model features calculated on different fields (e.g. title, content or anchor text), even though such weighting models have been criticised in the literature. Experiments to address these three questions are conducted on Web search datasets, using various weighting models as query dependent, and typical query independent features, which are combined using three learning to rank techniques. In particular, we show and explain why multiple weighting models should be deployed as features. Moreover, we unexpectedly find that training the weighting model's parameters degrades learned models effectiveness. Finally, we show that computing a weighting model separately for each field is less effective than more theoretically-sound field-based weighting models
Modelling efficient novelty-based search result diversification in metric spaces
Novelty-based diversification provides a way to tackle ambiguous queries by re-ranking a set of retrieved documents. Current approaches are typically greedy, requiring O(n²) document-document comparisons in order to diversify a ranking of n documents. In this article, we introduce a new approach for novelty-based search result diversification to reduce the overhead incurred by document-document comparisons. To this end, we model novelty promotion as a similarity search in a metric space, exploiting the properties of this space to efficiently identify novel documents. We investigate three different approaches: pivoting-based, clustering-based, and permutation-based. In the first two, a novel document is one that lies outside the range of a pivot or outside a cluster. In the latter, a novel document is one that has a different signature (i.e., the document's relative distance to a distinguished set of fixed objects called permutants) compared to previously selected documents. Thorough experiments using two TREC test collections for diversity evaluation, as well as a large sample of the query stream of a commercial search engine, show that our approaches perform at least as effectively as well-known novelty-based diversification approaches in the literature, while dramatically improving their efficiency
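The pivoting idea described above can be sketched roughly as follows. This is a simplified illustration and not the authors' algorithm: it assumes documents are dense vectors, uses Euclidean distance as the metric, and treats every selected document as a new pivot with a fixed, hypothetical `radius`. A document counts as novel only if it lies outside the range of every pivot chosen so far, so each candidate is compared with the (usually few) pivots instead of with all other documents.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pivot_diversify(ranked_docs, radius, k, dist=euclidean):
    """Select up to k novel documents from a relevance-ordered list.

    ranked_docs: list of (doc_id, vector) pairs, most relevant first
    radius: range around each pivot inside which documents count as redundant
    """
    pivots = []    # vectors of the documents selected so far
    selected = []  # ids of the selected documents, in selection order
    for doc_id, vector in ranked_docs:
        # Novel means: outside the range of every existing pivot.
        if all(dist(vector, pivot) > radius for pivot in pivots):
            pivots.append(vector)
            selected.append(doc_id)
            if len(selected) == k:
                break
    return selected


# Hypothetical ranking: "b" and "d" are near-duplicates of "a" and "c".
ranking = [
    ("a", (0.0, 0.0)),
    ("b", (0.1, 0.0)),
    ("c", (5.0, 5.0)),
    ("d", (5.0, 5.1)),
    ("e", (10.0, 0.0)),
]
print(pivot_diversify(ranking, radius=1.0, k=3))
```

The choice of `radius` controls how aggressive the filtering is: a larger value discards more documents as redundant, while a value of zero keeps every document whose vector differs from all pivots.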
Information retrieval on the blogosphere
Blogs have recently emerged as a new open, rapidly evolving and reactive publishing medium on the Web. Rather than being managed by a central entity, the content on the blogosphere, the collection of all blogs on the Web, is produced by millions of independent bloggers, who can write about virtually anything. This open publishing paradigm has led to a growing mass of user-generated content on the Web, which can vary tremendously both in format and quality when looked at in isolation, but which can also reveal interesting patterns when observed in aggregation. One field particularly interested in studying how information is produced, consumed, and searched in the blogosphere is information retrieval. In this survey, we review the published literature on searching the blogosphere. In particular, we describe the phenomenon of blogging and the motivations for searching for information on blogs. We cover both the search tasks underlying blog searchers' information needs and the most successful approaches to these tasks. These include blog post and full blog search tasks, as well as blog-aided search tasks, such as trend and market analysis. Finally, we also describe the publicly available resources that support research on searching the blogosphere